Optimal Enumeration: Efficient Top-k Tree Matching

Authors

  • Lijun Chang
  • Xuemin Lin
  • Wenjie Zhang
  • Jeffrey Xu Yu
  • Ying Zhang
  • Lu Qin
Abstract

Driven by many real applications, graph pattern matching has attracted a great deal of attention recently. A twig-pattern matching may produce an extremely large number of matches in a graph; this may not only confuse users with too many results but also lead to high computational costs. In this paper, we study the problem of top-k tree pattern matching; that is, given a rooted tree T, compute its top-k matches in a directed graph G based on the twig-pattern matching semantics. First, we present a novel and optimal enumeration paradigm based on the principle of Lawler's procedure. We show that our enumeration algorithm runs in O(n_T + log k) time in each round, where n_T is the number of nodes in T. Considering that the time complexity to output a match of T is O(n_T) and n_T ≥ log k in practice, our enumeration technique is optimal. Moreover, the cost of generating the top-1 match of T in our algorithm is O(m_R), where m_R is the number of edges in the transitive closure of a data graph G involving all nodes relevant to T. O(m_R) is also optimal in the worst case without prior knowledge of G. Consequently, our algorithm is optimal, with running time O(m_R + k(n_T + log k)), in contrast to the time complexity O(m_R log k + k n_T (log k + d_T)) of the existing technique, where d_T is the maximal node degree in T. Second, a novel priority-based access technique is proposed, which greatly reduces the number of edges accessed and results in a significant performance improvement. Finally, we apply our techniques to the general form of the top-k graph pattern matching problem (i.e., the query is a graph) to improve the existing techniques. Comprehensive empirical studies demonstrate that our techniques may improve on the existing techniques by orders of magnitude.
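
The abstract names Lawler's procedure as the basis of the enumeration paradigm. The following is only a minimal sketch of that general idea on a toy problem, not the paper's algorithm: it picks one item from each of several cost lists and enumerates the k cheapest combinations. The names `top_k_combinations` and `cost_lists` are hypothetical, and tree matches in a graph are not modelled here. Each round pops the best remaining solution from a priority queue, then splits the rest of that solution's subspace into disjoint parts and pushes the best solution of each part.

```python
import heapq


def top_k_combinations(cost_lists, k):
    """Return up to k (total_cost, choice) pairs in nondecreasing cost order.

    choice[i] is an index into the ascending-sorted copy of cost_lists[i].
    Toy illustration of Lawler-style enumeration (hypothetical helper).
    """
    if any(len(costs) == 0 for costs in cost_lists):
        return []  # no complete combination exists
    lists = [sorted(costs) for costs in cost_lists]
    m = len(lists)

    # Heap entries: (cost, choice, d). Positions before d are fixed, position d
    # may only move forward, and positions after d are at their minimum (0).
    best = (0,) * m
    heap = [(sum(costs[0] for costs in lists), best, 0)]
    results = []

    while heap and len(results) < k:
        cost, choice, d = heapq.heappop(heap)
        results.append((cost, choice))
        # Partition the popped subspace minus the popped solution:
        # part i keeps choice[:i], advances position i, frees the rest.
        for i in range(d, m):
            nxt = choice[i] + 1
            if nxt >= len(lists[i]):
                continue  # part i is empty
            child = choice[:i] + (nxt,) + (0,) * (m - i - 1)
            child_cost = cost - lists[i][choice[i]] + lists[i][nxt]
            heapq.heappush(heap, (child_cost, child, i))
    return results


if __name__ == "__main__":
    for total, choice in top_k_combinations([[1, 4], [2, 3], [0, 5]], 3):
        print(total, choice)
```

This naive version builds up to m child solutions per round, each in O(m) time, so it does not reach the O(n_T + log k) per-round bound claimed in the abstract; it only shows the pop-then-partition pattern.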

Similar resources

Optimal Exact String Matching Based on Suffix Arrays

Using the suffix tree of a string S, decision queries of the type “Is P a substring of S?” can be answered in O(|P|) time and enumeration queries of the type “Where are all z occurrences of P in S?” can be answered in O(|P|+z) time, totally independent of the size of S. However, in large-scale applications such as genome analysis, the space requirements of the suffix tree are a severe drawback. Th...

Optimal exact string matching based on suffix arrays

Using the suffix tree of a string S, decision queries of the type “Is P a substring of S?” can be answered in O(|P|) time and enumeration queries of the type “Where are all z occurrences of P in S?” can be answered in O(|P|+z) time, totally independent of the size of S. However, in large scale applications as genome analysis, the space requirements of the suffix tree are a severe drawback. The su ...

Efficient Enumeration of Induced Subtrees in a K-Degenerate Graph

In this paper, we address the problem of enumerating all induced subtrees in an input k-degenerate graph, where an induced subtree is an acyclic and connected induced subgraph. A graph G = (V,E) is a k-degenerate graph if each of its induced subgraphs has a vertex whose degree is less than or equal to k, and many real-world graphs have small degeneracies, or very close to small degeneracies. Alt...

Counter Strike: Generic Top-Down Join Enumeration for Hypergraphs

Finding the optimal execution order of join operations is a crucial task of today’s cost-based query optimizers. There are two approaches to identify the best plan: bottom-up and top-down join enumeration. But only the top-down approach allows for branch-and-bound pruning, which can improve compile time by several orders of magnitude while still preserving optimality. For both optimization strat...

Efficient Enumeration of All Ladder Lotteries with k Bars

A ladder lottery, known as the “Amidakuji” in Japan, is a common way to choose an assignment randomly. Formally, a ladder lottery of a permutation π = (p_1, p_2, ..., p_n) is a network with n vertical lines (lines for short) and many horizontal lines (bars for short) as follows (see Fig. 1). The i-th line from the left is called line i. The top ends of the n lines correspond to π. The bottom en...

Journal title:
  • PVLDB

Volume 8  Issue 

Pages  -

Publication date 2015